Novel Scoring Scale for Quality Assessment of Lung Ultrasound in the Emergency Department

Introduction The use of a reliable scoring system for quality assessment (QA) is imperative to limit inconsistencies in measuring ultrasound acquisition skills. The current grading scale used for QA endorsed by the American College of Emergency Physicians (ACEP) is non-specific, applies irrespective of the type of study performed, and has not been rigorously validated. Our goal in this study was to determine whether a succinct, organ-specific grading scale designed for lung-specific QA would be more precise, with better interobserver agreement.
Methods This was a prospective validation study of an objective QA scale for lung ultrasound (LUS) in the emergency department. We identified the first 100 LUS performed in normal clinical practice in the year 2020. Four reviewers at an urban academic center who were either emergency ultrasound fellowship-trained or current fellows with at least six months of QA experience scored each study, resulting in a total of 400 scores. The primary outcome was the level of agreement between the reviewers. Our secondary outcome was the variability of the scores given to the studies. For the agreement between reviewers, we computed the intraclass correlation coefficient (ICC) based on a two-way random-effect model with a single rater for each grading scale. We generated 10,000 bootstrapped ICCs to construct 95% confidence intervals (CI) for both grading systems. A two-sided one-sample t-test was used to determine whether there were differences in the bootstrapped ICCs between the two grading systems.
Results The ICC between reviewers was 0.552 (95% CI 0.40–0.68) for the ACEP grading scale and 0.703 (95% CI 0.59–0.79) for the novel grading scale (P < 0.001), indicating significantly more interobserver agreement using the novel scale compared to the ACEP scale. The variance of scores was similar (0.93 and 0.92 for the novel and ACEP scales, respectively).
Conclusion We found an increased interobserver agreement between reviewers when using the novel, organ-specific scale when compared with the ACEP grading scale. Increased consistency in feedback based on objective criteria directed to the specific, targeted organ provides an opportunity to enhance learner education and satisfaction with their ultrasound education.


BRIEF RESEARCH REPORT
Western Journal of Emergency Medicine, Volume 25, No. 2: March 2024

INTRODUCTION
Lung ultrasound (LUS) is frequently used in the emergency department (ED) to assess both medical and trauma patients [1,2]. Quality assessment (QA) of ultrasound images is one of the six required elements of diagnostic ultrasound per the American College of Emergency Physicians (ACEP) and is routinely performed to evaluate image quality, ensure appropriate patient care, and enable reviewers to assess user performance [2]. The use of a reliable scoring system for QA is imperative to limit inconsistencies in measuring ultrasound acquisition skills.
The current QA grading scale endorsed by ACEP was developed from a consensus report of emergency ultrasound leaders to provide a systematic method to report and communicate ultrasound findings [2]. It is a non-specific scale that applies irrespective of the type of study performed and has not been rigorously validated [3-5]. Alternative LUS assessment tools have been developed; however, they are either too extensive to be practical for routine QA use or focused on image acquisition skills rather than tailored to anatomic feedback [6,7]. Our goal in this study was to determine whether a succinct, organ-specific grading scale designed for QA would be more precise, with better interobserver agreement.

METHODS
This was a prospective validation study of an objective QA scale for LUS. We developed a novel, lung-specific grading scale through a rigorous review of expert, published experience at an outside, unaffiliated institution (Scripps Mercy Hospital, San Diego, CA) [8-13]. In the expert review, the currently available organ-specific grading scale found in the literature was adapted to the anatomy of the chest wall [3,5,8-13]. Four critical landmarks (rib shadows, pleural line, A/B lines, and technical flaws) were recognized as commonalities in all published images in LUS studies, including expert consensus [14,15]. We therefore divided these landmarks into a point scale that progressively defines the pattern of acquisition required to obtain an image (ie, bones first, then pleural line, followed by artifacts). We described technical flaws as non-optimized depth/gain, distracting adjacent structures, inadequate axis, or hand movement. We deemed flaws to be major if they were present to a degree significant enough to decrease diagnostic capabilities, or if multiple flaws were present.
The scale was then validated at an urban academic tertiary care center in Richmond, Virginia. We identified the first 100 LUS studies completed as part of regular clinical practice in the ED by emergency physicians with two or more LUS videos performed in the year 2020. Dedicated thoracic ultrasound examinations are in general performed by resident physicians with attending oversight. Studies were obtained using a Sonosite X Porte ultrasound machine (Fujifilm Sonosite, Bothell, WA) with either the C60XP 5-2-MHz curvilinear transducer, the L25 13-6-MHz linear array transducer, or the P19 5-1-MHz phased array transducer. Four reviewers who were either emergency ultrasound fellowship-trained or current fellows with at least six months of QA experience scored each of the 100 studies, resulting in a total of 400 scores. Two blinded reviewers used the current ACEP grading scale [2], and two used a novel lung-specific grading scale; there was one fellow and one ultrasound-trained physician in each group (Figure).
The primary outcome was the level of agreement between the reviewers, indicating the reliability of the scoring system. Our secondary outcome was the variability of the scores given to the studies. For the agreement between reviewers, we computed the intraclass correlation coefficient (ICC) based on a two-way random-effect model with a single rater for each grading scale. Ten thousand bootstrapped ICCs were generated to construct 95% confidence intervals (CI) for both grading systems. We used a two-sided one-sample t-test to determine whether there were differences in the bootstrapped ICCs between the two grading systems.
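The agreement statistics described in this section can be illustrated with a minimal sketch. It assumes the Shrout-Fleiss ICC(2,1) form (two-way random effects, absolute agreement, single rater) and a percentile bootstrap that resamples studies with replacement. The text does not state which ICC variant, software, or bootstrap scheme was used, so the function names (icc_2_1, bootstrap_icc, percentile_ci) and the example ratings are hypothetical and are not the study data.

```python
import random


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: list of rows, one row per study, one score per rater in each row.
    Assumed variant; the study does not specify the exact formula used.
    """
    n = len(scores)          # number of studies (subjects)
    k = len(scores[0])       # number of raters
    grand = sum(x for row in scores for x in row) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-study mean square
    msc = ss_cols / (k - 1)                # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))     # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def bootstrap_icc(scores, n_boot=10_000, seed=0):
    """Resample studies (rows) with replacement and recompute the ICC each time."""
    rng = random.Random(seed)
    n = len(scores)
    replicates = []
    while len(replicates) < n_boot:
        sample = [scores[rng.randrange(n)] for _ in range(n)]
        try:
            replicates.append(icc_2_1(sample))
        except ZeroDivisionError:
            continue  # resample with no variance at all; draw again
    return replicates


def percentile_ci(values, level=0.95):
    """Percentile confidence interval from a list of bootstrap replicates."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int((1 - level) / 2 * last)], ordered[int((1 + level) / 2 * last)]


# Illustrative ratings only (rows = studies, columns = raters); not study data.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
estimate = icc_2_1(example)
ci_low, ci_high = percentile_ci(bootstrap_icc(example, n_boot=2000, seed=1))
print(f"ICC(2,1) = {estimate:.3f}, 95% CI {ci_low:.3f} to {ci_high:.3f}")
```

In this sketch, each grading scale would be passed as its own 100-by-2 matrix (one column per reviewer), giving one point estimate and one bootstrap interval per scale. The comparison of the two sets of bootstrapped ICCs by t-test is not reproduced here.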

RESULTS
The first 100 LUS studies completed in the ED by emergency medicine residents (postgraduate year [PGY]-1,

Population Health Research Capsule
What do we already know about this issue? A reliable method of quality assessment (QA) of ultrasound images is imperative to assess user performance and limit inconsistencies in measuring ultrasound acquisition skills.
What was the research question? Is there a QA scoring scale for lung ultrasound (LUS) that is more precise than the commonly used ACEP scoring scale?
What was the major finding of the study? In the QA of LUS, a novel scoring scale showed significantly more interobserver agreement compared to the ACEP scale.
How does this improve population health? A more individualized scoring scale for QA of LUS results in less grading variance and more objective feedback when compared to the ACEP scale.

DISCUSSION
The current ACEP grading scale used for QA was developed from a consensus report of emergency ultrasound leaders but has not been systematically validated [2]. The use of a reliable, validated scoring system for QA is imperative to limit inconsistencies and ensure objectivity in measuring ultrasound acquisition skill. The vague language used in the ACEP scale may contribute to variable interpretation by those assessing studies, leading to discrepancies in grading ultrasound skill. Inconsistent feedback may confuse the learner and hinder growth of technical skill. In our study, we found that there was an increased interobserver agreement between reviewers when using the novel, organ-specific scale when compared with the ACEP grading scale. Increased consistency in feedback, combined with directed feedback to the specific targeted organ, provides an opportunity to enhance learner education and satisfaction with their ultrasound education.
... [4,5] This is thought to be due in part to the complexity of these scales and/or to their validation outside the ED, which limits external validity [3,4,6,7]. We sought to develop a scale that was concise, organ-specific, and applicable to the most common setting in which LUS is performed. To improve such vague language as "all structures imaged well," we found benefit in specifically stating the anatomic landmarks needed to maximize diagnostic imaging in each view. By emphasizing proper imaging technique before diagnostic interpretation, our assessment tool may reduce errors in image grading and variability in learner feedback.

LIMITATIONS
Our study was limited by its evaluation of a QA experience at a single academic tertiary-care center, where the validation took place. Patient demographics were not collected. The blinded reviewers were all trained (or were current trainees) at the same clinical ultrasound fellowship and, therefore, were taught to perform QA using the ACEP grading scale in a similar manner. This may have contributed to higher agreement with the ACEP scale than if reviewers had trained at different institutions. Further, the scale itself was developed after an extensive review of the literature and customized into a feasible scale that is directly applicable to learner objectives. As such, this scale lacks the rigor of alternative methodological approaches such as modified Delphi analysis. Importantly, this study did not validate whether the score was related to the diagnosis or outcome, or whether it improved QA efficiency or educational feedback; it assessed only the degree of agreement. Additionally, our scale focuses on pathology related to the pleural line itself and does not include language to assess the ability to diagnose a pleural effusion. Finally, our study involved reviewers with six months of experience in QA and included a small number (100) of studies; consequently, our results may be understated. Further research is warranted to validate this novel scale, investigate learner satisfaction, and assess its impact on educational enhancement.

CONCLUSION
We found that a more individualized quality assessment scale of ultrasound imaging targeted to a specific organ (in this case, the lung) results in less grading variance and more consistent, objective feedback. This finding may have implications for knowledge gained and learner satisfaction. Future studies are warranted prior to the adoption of this novel scale in clinical practice.

Address for Correspondence: Jessica R. Balderston, MD, VCU Medical Center, Department of Emergency Medicine, 1250 E. Marshall St, Box 980401, Richmond, VA 23298-0401. Email: Jessica.balderston@vcuhealth.org
Conflicts of Interest: By the WestJEM article submission agreement, all authors are required to disclose all affiliations, funding sources and financial or management relationships that could be perceived as potential sources of bias. No author has professional or financial relationships with any companies that are relevant to this study. There are no conflicts of interest or sources of funding to declare.
Copyright: © 2024 Balderston et al. This is an open access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) License. See: http://creativecommons.org/licenses/by/4.0/

Table. Summary table of scoring systems.